IN THE CLAIMS: 

1. (Currently amended) A method of selecting data sets for use with a predictive
algorithm of customer behavior, comprising:

generating a first geographical distribution of a training data set for a predictive
algorithm of customer behavior, said training data set being derived from a database
containing customer information;

generating a second geographical distribution of a testing data set for said 
predictive algorithm of customer behavior, said testing data set being derived from said 
database containing customer information;

comparing the first geographical distribution and the second geographical 
distribution to identify a discrepancy between the first geographical distribution and the 
second geographical distribution; and

modifying selection of entries in one or more of the training data set and the 
testing data set based on the discrepancy between the first geographical distribution and 
the second geographical distribution.

2. (Currently amended) The method of claim 1, wherein the first geographical
distribution and the second geographical distribution are distributions of drive time from 
a customer geographical location to a commercial establishment geographical location. 

3. (Currently amended) The method of claim 1, wherein the first geographical
distribution and the second geographical distribution are distributions of distance between
a customer geographical location and a commercial establishment geographical location. 
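
By way of non-limiting illustration, the distance recited in claim 3 could be computed as a great-circle distance. The sketch below assumes that each location is a (latitude, longitude) pair in degrees and uses the haversine formula with a mean Earth radius of 6371 km. The names `haversine_km` and `distance_distribution` and the coordinate representation are assumptions made for this example and are not recited in the claims.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(customer, establishment):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, customer)
    lat2, lon2 = map(math.radians, establishment)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_distribution(customers, establishment):
    """One distance per customer location: the raw geographical distribution."""
    return [haversine_km(c, establishment) for c in customers]
```

Applying `distance_distribution` once to the training data set and once to the testing data set would yield the two distributions to be compared.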

4. (Currently amended) The method of claim 1, wherein comparing the first
geographical distribution and the second geographical distribution includes comparing 
one or more of a mean, mode, and standard deviation of the first geographical distribution 
to one or more of a mean, mode, and standard deviation of the second geographical 
distribution. 
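
As a non-limiting sketch of the comparison recited in claim 4, the mean, mode, and standard deviation of each distribution can be computed and differenced. The example assumes the distributions are lists of rounded values (for instance, drive times in whole minutes) so that a mode is meaningful, and that each list has at least two entries. The function names are hypothetical.

```python
import statistics


def summarize(values):
    """Mean, mode, and sample standard deviation of a distribution."""
    return {
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
    }


def compare_summaries(train_values, test_values):
    """Absolute difference of each summary statistic between the two sets."""
    train = summarize(train_values)
    test = summarize(test_values)
    return {name: abs(train[name] - test[name]) for name in train}
```

In Python 3.8 and later, `statistics.mode` returns the first mode encountered when several values tie, so a tie does not raise an error.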

5. (Currently amended) The method of claim 1, wherein the first geographical
distribution and the second geographical distribution are distributions of a weighted 
distance between a customer geographical location and commercial establishment 
geographical locations. 
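
The claims do not specify how the weighting is performed. One possible reading, offered only as an assumption, is a weighted average of a customer's distances to several commercial establishments, with one weight per establishment (for example, a store's share of visits). The function name and the choice of a weighted average are hypothetical.

```python
def weighted_distance(distances, weights):
    """Weighted average of one customer's distances to several establishments."""
    if len(distances) != len(weights) or not distances:
        raise ValueError("need one weight per distance")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(distances, weights)) / total_weight
```

For instance, distances of 2.0 and 10.0 with weights 3 and 1 give (6 + 10) / 4 = 4.0. The same function could be applied to drive times instead of distances.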

6. (Currently amended) The method of claim 1, wherein the first geographical
distribution and the second geographical distribution are distributions of a weighted drive 
time between a customer geographical location and commercial establishment 
geographical locations. 

7. (Original) The method of claim 1, wherein modifying selection of entries in one
or more of the training data set and the testing data set includes generating
recommendations for improving selection of entries in one or more of the training data
set and the testing data set.

8. (Currently amended) The method of claim 1, [[wherein the]] further comprising the
step of selecting said training data set and [[the]] said testing data set [[are selected]] from a
customer information database.

9. (Currently amended) The method of claim 1, further comprising comparing at
least one of the first geographical distribution and the second geographical distribution to
a geographical distribution of a customer database. 

10. (Currently amended) The method of claim 1, wherein the first geographical
distribution and second geographical distribution are frequency distributions of one of
drive time and distance between a customer geographical location and one or more 
commercial establishment geographical locations. 
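
A frequency distribution of the kind recited in claim 10 can be sketched by counting values per fixed-width bin. The bin width, the function names, and the normalization step are assumptions made for this example; normalizing to relative frequencies is one way to compare a training data set and a testing data set of different sizes.

```python
from collections import Counter


def frequency_distribution(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return Counter(int(v // bin_width) * bin_width for v in values)


def relative_frequencies(counts):
    """Convert counts to fractions so sets of different sizes are comparable."""
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()}
```

With drive times of 1, 4, 5, 9, and 12 minutes and a 5-minute bin width, the bins starting at 0, 5, and 10 minutes hold 2, 2, and 1 entries respectively.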

11. (Currently amended) The method of claim 9, wherein comparing at least one of
the first geographical distribution and the second geographical distribution to a
geographical distribution of a customer database includes:

generating a composite data set from the training data set and the testing data set; and

generating a composite geographical distribution from the composite data set. 
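
The two steps of claim 11 can be illustrated as pooling the training and testing values and then summarizing the pooled set. The sketch assumes each data set is already reduced to a list of distances or drive times, and it assumes, as one possible measure only, that the composite is compared with the customer database by the difference of means. All names are hypothetical.

```python
import statistics


def composite_distribution(train_values, test_values):
    """Pool both data sets into one composite list of values."""
    return list(train_values) + list(test_values)


def deviation_from_database(composite, database_values):
    """Absolute difference between the composite mean and the database mean."""
    return abs(statistics.mean(composite) - statistics.mean(database_values))
```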

12. (Original) The method of claim 1, wherein modifying selection of entries in one
or more of the training data set and the testing data set includes changing one of a random
selection algorithm and a seed value for a random selection algorithm. 
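
Changing the seed value, as recited in claim 12, can be sketched as repeating a seeded random split until the two distributions agree. The example below is a minimal illustration under stated assumptions: the discrepancy is taken to be the absolute difference of means, the 70/30 split fraction and the `tolerance` threshold are arbitrary, and the function names are hypothetical.

```python
import random
import statistics


def split_by_seed(values, seed, train_fraction=0.7):
    """Shuffle with a seeded generator, then cut into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def reselect_until_close(values, tolerance, max_seeds=1000):
    """Try successive seed values until the mean discrepancy is within tolerance."""
    for seed in range(max_seeds):
        train, test = split_by_seed(values, seed)
        if not train or not test:
            raise ValueError("need enough values for both sets")
        gap = abs(statistics.mean(train) - statistics.mean(test))
        if gap <= tolerance:
            return seed, train, test
    raise RuntimeError("no seed produced an acceptable split")
```

Because `random.Random(seed)` is deterministic for a given seed, the returned seed is enough to reproduce the accepted selection later.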

13. (Currently amended) The method of claim 1, further comprising training [[a]] said
predictive algorithm for customer behavior using at least one of the training data set and 
the testing data set if the discrepancy is within a predetermined tolerance. 

14. (Currently amended) The method of claim 13, wherein the predictive algorithm 
for customer behavior is a discovery based data mining algorithm. 

15. (Currently amended) An apparatus for selecting data sets for use with a predictive
algorithm of customer behavior, comprising:

a statistical engine; and 

a comparison engine coupled to the statistical engine, wherein the statistical 
engine generates a first geographical distribution of a training data set of customer 
information and a second geographical distribution of a testing data set of customer
information, the comparison engine compares the first geographical distribution and the
second geographical distribution to identify a discrepancy between the first geographical
distribution and the second geographical distribution and modifies selection of entries in
one or more of the training data set and the testing data set based on the discrepancy
between the first geographical distribution and the second geographical distribution.

16. (Currently amended) The apparatus of claim 15, wherein the first geographical
distribution and the second geographical distribution are distributions of drive time from 
a customer geographical location to a commercial establishment geographical location. 

17. (Currently amended) The apparatus of claim 15, wherein the first geographical
distribution and the second geographical distribution are distributions of distance between
a customer geographical location and a commercial establishment geographical location.

18. (Currently amended) The apparatus of claim 15, wherein the comparison engine
compares the first geographical distribution and the second geographical distribution by
comparing one or more of a mean, mode, and standard deviation of the first geographical
distribution to one or more of a mean, mode, and standard deviation of the second
geographical distribution.

19. (Currently amended) The apparatus of claim 15, wherein the first geographical
distribution and the second geographical distribution are distributions of a weighted
distance between a customer geographical location and commercial establishment 
geographical locations. 

20. (Currently amended) The apparatus of claim 15, wherein the first geographical
distribution and the second geographical distribution are distributions of a weighted drive
time between a customer geographical location and commercial establishment
geographical locations.

21. (Original) The apparatus of claim 15, wherein the comparison engine modifies
selection of entries in one or more of the training data set and the testing data set by 
generating recommendations for improving selection of entries in one or more of the 
training data set and the testing data set. 

22. (Original) The apparatus of claim 15, further comprising a training data set/testing
data set selection device that selects the training data set and the testing data set from a 
customer information database. 

23. (Currently amended) The apparatus of claim 15, wherein the comparison engine
further compares at least one of the first geographical distribution and the second
geographical distribution to a geographical distribution of a customer database. 

24. (Currently amended) The apparatus of claim 15, wherein the first geographical
distribution and second geographical distribution are frequency distributions of one of 
drive time and distance between a customer geographical location and one or more 
commercial establishment geographical locations. 

25. (Currently amended) The apparatus of claim 23, wherein the comparison engine 
compares at least one of the first geographical distribution and the second geographical
distribution to a geographical distribution of a customer database by: 

generating a composite data set from the training data set and the testing data set; and

generating a composite geographical distribution from the composite data set. 

26. (Original) The apparatus of claim 15, wherein the comparison engine modifies 
selection of entries in one or more of the training data set and the testing data set by
changing one of a random selection algorithm and a seed value for a random selection 
algorithm. 

27. (Original) The apparatus of claim 15, further comprising a predictive algorithm 
device, wherein the predictive algorithm device is trained using at least one of the 
training data set and the testing data set if the discrepancy is within a predetermined 
tolerance. 

28. (Original) The apparatus of claim 27, wherein the predictive algorithm is a 
discovery based data mining algorithm. 

29. (Currently amended) A computer program product in a computer readable
medium for selecting data sets for use with a predictive algorithm of customer behavior,
comprising: 

first instructions for generating a first geographical distribution of a training data 
set derived from customer information;

second instructions for generating a second geographical distribution of a testing 
data set derived from customer information;

third instructions for comparing the first geographical distribution and the second
geographical distribution to identify a discrepancy between the first geographical
distribution and the second geographical distribution; and

fourth instructions for modifying selection of entries in one or more of the 
training data set and the testing data set based on the discrepancy between the first 
geographical distribution and the second geographical distribution. 

30. (Currently amended) The computer program product of claim 29, wherein the first 
geographical distribution and the second geographical distribution are distributions of 
drive time from a customer geographical location to a commercial establishment 
geographical location. 

31. (Currently amended) The computer program product of claim 29, wherein the first
geographical distribution and the second geographical distribution are distributions of 
distance between a customer geographical location and a commercial establishment 
geographical location. 

32. (Currently amended) The computer program product of claim 29, wherein the 
third instructions for comparing the first geographical distribution and the second
geographical distribution include instructions for comparing one or more of a mean,
mode, and standard deviation of the first geographical distribution to one or more of a
mean, mode, and standard deviation of the second geographical distribution.

33. (Currently amended) The computer program product of claim 29, wherein the
first geographical distribution and the second geographical distribution are distributions 
of a weighted distance between a customer geographical location and commercial 
establishment geographical locations. 

34. (Currently amended) The computer program product of claim 29, wherein the first
geographical distribution and the second geographical distribution are distributions of a 
weighted drive time between a customer geographical location and commercial 
establishment geographical locations. 

35. (Original) The computer program product of claim 29, wherein the fourth 
instructions for modifying selection of entries in one or more of the training data set and 
the testing data set include instructions for generating recommendations for improving 
selection of entries in one or more of the training data set and the testing data set. 

36. (Currently amended) The computer program product of claim 29, further 
comprising fifth instructions for comparing at least one of the first geographical
distribution and the second geographical distribution to a geographical distribution of a
customer database. 

37. (Currently amended) The computer program product of claim 29, wherein the first 
geographical distribution and second geographical distribution are frequency distributions
of one of drive time and distance between a customer geographical location and one or 
more commercial establishment geographical locations. 

38. (Currently amended) The method of claim 36, wherein the fifth instructions 
include: 

instructions for generating a composite data set from the training data set and the 
testing data set; and 

instructions for generating a composite geographical distribution from the 
composite data set. 

39. (Original) The computer program product of claim 29, wherein the fourth
instructions for modifying selection of entries in one or more of the training data set and
the testing data set include instructions for changing one of a random selection algorithm 
and a seed value for a random selection algorithm. 

40. (Currently amended) The computer program product of claim 29, further 
comprising fifth instructions for training a predictive algorithm of customer behavior 
using at least one of the training data set and the testing data set if the discrepancy is
within a predetermined tolerance. 